feat: add DSA cache and PP support#134

Merged
feiqiangs merged 1 commit into taco-project:feat/layerwise_rebase from zhjc1124:feat/layerwise_rebase
Apr 6, 2026

Conversation

@zhjc1124
Contributor

@zhjc1124 zhjc1124 commented Apr 2, 2026

Summary

This PR adds DSA (Dynamic Sparse Attention) cache support and Pipeline Parallelism (PP) support to FlexKV.

Changes

New Features

DSA/NSA Indexer Cache Support

  • Added dataclass in to hold indexer-specific cache configuration (e.g., , , ) for DSA/NSA sparse attention models
  • Extended to manage separate indexer storage handles () for CPU, SSD, and REMOTE devices
  • Extended to accept optional indexer GPU blocks
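
To make the indexer-cache idea concrete, here is a minimal sketch of what such a configuration dataclass could look like. The class and field names (`IndexerCacheConfig`, `index_head_dim`, `index_n_heads`, `index_topk`) are illustrative assumptions, not FlexKV's actual API; the PR's real identifiers were lost in this page's rendering.

```python
from dataclasses import dataclass

# Hypothetical sketch: a dataclass holding indexer-specific cache
# configuration for DSA/NSA sparse attention models. All names here
# are assumptions for illustration, not FlexKV's real identifiers.
@dataclass
class IndexerCacheConfig:
    index_head_dim: int   # head dimension of the indexer keys
    index_n_heads: int    # number of indexer heads
    index_topk: int       # top-k blocks selected by the sparse indexer

    def bytes_per_token(self, dtype_size: int = 2) -> int:
        """Size of one token's indexer cache entry (fp16 -> 2 bytes)."""
        return self.index_n_heads * self.index_head_dim * dtype_size

cfg = IndexerCacheConfig(index_head_dim=128, index_n_heads=1, index_topk=2048)
print(cfg.bytes_per_token())  # 1 * 128 * 2 = 256
```

A separate config like this lets the cache manager size the indexer storage handles (CPU, SSD, REMOTE) independently of the main KV cache, whose per-token footprint is generally much larger.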

Pipeline Parallelism (PP) Support

  • Added parameter to so each PP rank only manages its own layers instead of the full model layer count
  • Fixed resolution to use total heads (not per-rank heads) for correct KV layout
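
The per-rank layer ownership described above can be sketched as follows. This is a generic contiguous-partition helper under the common convention that remainder layers go to the earliest ranks; the function name and signature are assumptions for illustration, not the parameter this PR actually adds.

```python
def layers_for_pp_rank(num_total_layers: int, pp_size: int, pp_rank: int) -> range:
    """Hypothetical helper: the contiguous slice of layers owned by one
    pipeline-parallel rank. Remainder layers are assigned to the earliest
    ranks, a common convention (not necessarily FlexKV's)."""
    base, rem = divmod(num_total_layers, pp_size)
    start = pp_rank * base + min(pp_rank, rem)
    count = base + (1 if pp_rank < rem else 0)
    return range(start, start + count)

# 30 layers over 4 PP ranks -> per-rank sizes 8, 8, 7, 7
for r in range(4):
    owned = layers_for_pp_rank(30, 4, r)
    print(f"rank {r}: layers {owned.start}..{owned.stop - 1} ({len(owned)} layers)")
```

Sizing each rank's cache by `len(layers_for_pp_rank(...))` rather than the full model layer count is the point of the fix: with PP, a rank stores KV blocks only for its own layers. Head-count resolution is the opposite case — PP splits layers, not attention heads, so the KV layout must be derived from the model's total head count.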

@YconquestY YconquestY self-requested a review April 2, 2026 08:57
@zhjc1124 zhjc1124 force-pushed the feat/layerwise_rebase branch 8 times, most recently from 619f422 to d5033a7 on April 5, 2026 04:30
@zhjc1124 zhjc1124 force-pushed the feat/layerwise_rebase branch from f8ca05d to 57045c3 on April 5, 2026 09:08
@feiqiangs feiqiangs self-requested a review April 6, 2026 02:41
@feiqiangs feiqiangs merged commit 72c5187 into taco-project:feat/layerwise_rebase Apr 6, 2026
feiqiangs pushed a commit that referenced this pull request Apr 6, 2026